39 research outputs found

    Mapping microarray gene expression data into dissimilarity spaces for tumor classification

    Microarray gene expression data sets usually contain a large number of genes, but a small number of samples. In this article, we present a two-stage classification model that combines feature selection with the dissimilarity-based representation paradigm. In the preprocessing stage, the ReliefF algorithm is used to generate a subset of top-ranked genes; in the learning/classification stage, the samples represented by the previously selected genes are mapped into a dissimilarity space, which is then used to construct a classifier capable of separating the classes more easily than a feature-based model. The ultimate aim of this paper is not to find the best subset of genes, but to analyze the performance of the dissimilarity-based models by means of a comprehensive collection of experiments for the classification of microarray gene expression data. To this end, we compare the classification results of an artificial neural network, a support vector machine and Fisher’s linear discriminant classifier built on the feature (gene) space with those on the dissimilarity space when varying the number of genes selected by ReliefF, using eight different microarray databases. The results show that the dissimilarity-based classifiers systematically outperform the feature-based models. In addition, classification through the proposed representation appears to be more robust (i.e. less sensitive to the number of genes) than that with the conventional feature-based representation.
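
As a rough illustration of the mapping step described in this abstract, the sketch below represents each sample by its distances to a set of prototypes. The Euclidean distance, the choice of prototypes, and the names `euclidean` and `to_dissimilarity_space` are assumptions made for this example; the abstract does not state which dissimilarity measure or representation set the authors used.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def to_dissimilarity_space(samples, prototypes):
    """Map each sample to the vector of its distances to every prototype.

    The result has one row per sample and one column per prototype,
    so the new dimensionality equals the number of prototypes.
    """
    return [[euclidean(s, p) for p in prototypes] for s in samples]


# Toy data: three samples described by two (already selected) genes.
samples = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
# Assumption: the first two samples serve as the representation set.
prototypes = samples[:2]

D = to_dissimilarity_space(samples, prototypes)
for row in D:
    print(row)
```

Any classifier can then be trained on the rows of `D` instead of the original gene-expression vectors.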

    Special Issue on Data Preprocessing in Pattern Recognition: Recent Progress, Trends and Applications

    The availability of rich data sets from several sources poses new opportunities to develop pattern recognition systems in a diverse array of industry, government, health, and academic areas [...

    A Comparative Study of Simple Online Learning Strategies for Streaming Data

    For several years, the analysis of data streams has attracted considerable attention in various research fields, such as database systems and data mining. The continuous increase in the volume of data and the high speed at which they arrive challenge the ability of computing systems to store, process and transmit them. This has also driven the development of new online learning strategies capable of predicting the behavior of streaming data. This paper compares three very simple learning methods applied to static data streams, using the 1-Nearest Neighbor classifier, a linear discriminant, a quadratic classifier, a decision tree, and the Naïve Bayes classifier. The three strategies have been taken from the literature. One of them includes a time-weighted strategy to remove obsolete objects from the reference set. The experiments were carried out on twelve real data sets. The aim of this experimental study is to establish the most suitable online learning model according to the performance of each classifier.

    Gene selection and disease prediction from gene expression data using a two-stage hetero-associative memory

    In general, gene expression microarrays consist of a vast number of genes and very few samples, which represents a critical challenge for disease prediction and diagnosis. This paper develops a two-stage algorithm that integrates feature selection and prediction by extending a type of hetero-associative neural network. In the first level, the algorithm generates the associative memory, whereas the second level picks the most relevant genes. To illustrate the applicability and efficiency of the proposed method, we use four different gene expression microarray databases and compare its classification performance against that of other renowned classifiers built on the whole (original) feature (gene) space. The experimental results show that the two-stage hetero-associative memory is quite competitive with standard classification models regarding overall accuracy, sensitivity and specificity. In addition, it also produces a significant decrease in computational effort and an increase in the biological interpretability of microarrays because worthless (irrelevant and/or redundant) genes are discarded.

    Experience in the development of a computer course for agrifood engineering

    Paper presented at INTED2018, 12th International Technology, Education and Development Conference (Valencia, Spain, 5-7 March, 2018). The agrifood industry is one of the most economically relevant sectors. The Universitat Jaume I (UJI) of Castellón, Spain, teaches the Bachelor’s Degree in Agrifood Engineering. The degree's aim is to equip students with the advanced knowledge, skills, and expertise to undertake technical and production management roles in the globally important agrifood sector. Ten professional profiles can be established for graduates of this degree. They range from those related purely to production to others involving information management, the environment and territorial organisation. The degree is composed of different courses. One of them is the computer science course. It addresses the basic knowledge and competencies needed to produce graduates with the expertise required to select, deploy, control and manage computer technology and data in the agrifood sector. The aim of this paper is to show the methodology and learning solutions used to teach students how to use information and communication technologies to improve competitiveness, productivity and sustainability in the agrifood sector. The paper presents the course objectives, the target competencies, the course contents, the assessments, and how the technology resources for e-learning are used to teach the subject.

    Experiences In The Development Of A Computer Networking Course

    Paper presented at EDULEARN2019, 11th International Conference on Education and New Learning Technologies (July 1-3, 2019, Palma, Mallorca, Spain). The degree in Computational Mathematics at the Universitat Jaume I (UJI) of Castellón, Spain, offers training that unites mathematics with computer science and focuses on areas where both are highly relevant. Graduates of this degree have a mixed profile, combining the theoretical training typical of a pure mathematics degree with the technical training of a pure computer science degree. This mixed profile makes them very versatile, able to perform different types of tasks, and gives them a high employment rate. In addition, their capacity for analysis and abstraction allows them to adapt easily to change and innovation. The degree is composed of different courses. One of these is the introduction to computer networking course. Computer networks allow their users to communicate and share information regardless of the geographic distance between them. Ensuring the optimal operation of computer networks is no easy task, and there is a need for highly trained professionals with a solid knowledge of network planning, design, and configuration, as well as the capacity to address varied and complex connectivity and security requirements. The computer networking course provides students with the fundamental concepts of computer networks, along with training to acquire the basic skills necessary to solve the various technical problems that arise in computer networks. The aim of this paper is to show the methodology and learning solutions used to train students to plan flexible, scalable computer networking systems for a business or organization of virtually any size. The paper presents the course objectives, the target competencies, the course contents, the assessments, and how the technology resources for e-learning are used to teach the subject.

    Information Systems Analysis And Design Course For Computer Engineering

    Paper presented at EDULEARN2019, 11th International Conference on Education and New Learning Technologies (July 1-3, 2019, Palma, Mallorca, Spain). Computer engineering is a branch of engineering that integrates several fields of computer science and electronics engineering required to develop computer hardware and software. The Universitat Jaume I (UJI) of Castellón, Spain, teaches the Bachelor’s Degree in Computer Engineering. The degree’s aim is to train students to design, implement and maintain computer systems for any sector of economic activity. These studies were formerly a five-year pre-Bologna curriculum and are now delivered under the Bologna approach. Four professional profiles are established for graduates of this degree: Specialization Track in Information Technologies, Specialization Track in Computer Engineering, Specialization Track in Information Systems and Specialization Track in Software Engineering. The degree is composed of different courses. One of these is the information systems analysis and design course. It addresses the knowledge, skills and competencies that graduates need to apply a systematic methodology for the analysis and design of information systems, together with the appropriate methods, techniques and tools. The aim of this paper is to show the methodology and learning solutions used to train students to undertake information systems analysis and design in any kind of organization. The paper presents the course objectives, the target competencies, the course contents, the assessments, and how the technology resources for e-learning are used to teach the subject.

    One-Sided Prototype Selection on Class Imbalanced Dissimilarity Matrices

    In the dissimilarity representation paradigm, several prototype selection methods have been used to address the problem of selecting a small representation set for generating a low-dimensional dissimilarity space. In addition, these methods have also been used to reduce the size of the dissimilarity matrix. However, these approaches assume a relatively balanced class distribution, which is grossly violated in many real-life problems. Often, the ratios of prior probabilities between classes are extremely skewed. In this paper, we study the use of renowned prototype selection methods adapted to the case of learning from an imbalanced dissimilarity matrix. More specifically, we propose the use of these methods to under-sample the majority class in the dissimilarity space. The experimental results demonstrate that the one-sided selection strategy performs better than the classical prototype selection methods applied over all classes.
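
The one-sided idea in this abstract can be sketched as follows: prototype selection is applied to the majority class only, and every minority object is kept. Random selection stands in here for the (unnamed) prototype selection methods studied in the paper, and the function name `one_sided_random_selection`, the square-matrix input, and the choice to shrink the majority class to the size of the next largest class are all assumptions for this example.

```python
import random
from collections import Counter


def one_sided_random_selection(D, labels, seed=0):
    """Under-sample only the majority class of a square dissimilarity matrix.

    D[i][j] is the dissimilarity between objects i and j.
    Returns the kept indices, the reduced matrix, and the reduced labels.
    Assumes at least two classes are present.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    majority = max(counts, key=counts.get)
    # Assumption: shrink the majority class to the size of the next largest class.
    n_keep = max(c for lab, c in counts.items() if lab != majority)

    majority_idx = [i for i, lab in enumerate(labels) if lab == majority]
    other_idx = [i for i, lab in enumerate(labels) if lab != majority]
    kept = sorted(other_idx + rng.sample(majority_idx, n_keep))

    # Keep only the rows and columns of the retained objects.
    D_reduced = [[D[i][j] for j in kept] for i in kept]
    labels_reduced = [labels[i] for i in kept]
    return kept, D_reduced, labels_reduced


# Toy 5x5 dissimilarity matrix: four majority objects (0) and one minority object (1).
D = [[abs(i - j) for j in range(5)] for i in range(5)]
labels = [0, 0, 0, 0, 1]

kept, D_reduced, labels_reduced = one_sided_random_selection(D, labels)
print(kept, labels_reduced)
```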

    Classification of high dimensional and imbalanced hyperspectral imagery data

    The present paper addresses the problem of classifying hyperspectral images with multiple imbalanced classes and very high dimensionality. Class imbalance is handled by resampling the data set, whereas PCA is applied to reduce the number of spectral bands. This is a preliminary study that aims to investigate the benefits of using these two techniques together, and also to evaluate the order of application that leads to the best classification performance. Experimental results demonstrate the significance of combining these preprocessing tools to improve the performance of hyperspectral imagery classification. Although the most effective order of application seems to be a resampling algorithm first and then PCA, this question still needs a much more thorough investigation. Partially supported by the Spanish Ministry of Education and Science under grants CSD2007–00018, AYA2008–05965–0596–C04–04/ESP and TIN2009–14205–C04–04, and by Fundació Caixa Castelló–Bancaixa under grant P1–1B2009–0

    Gait-based Gender Classification Considering Resampling and Feature Selection

    Two intrinsic data characteristics that arise in many domains are class imbalance and high dimensionality, which pose new challenges that should be addressed. When using gait for gender classification, public benchmark databases and renowned gait representations lead to these two problems, but they have not been jointly studied in depth. This paper is a preliminary study that aims to investigate the benefits of using several techniques to tackle the aforementioned problems either singly or in combination, and also to evaluate the order of application that leads to the best classification performance. Experimental results show the importance of jointly managing both problems for gait-based gender classification. In particular, it seems that the best strategy consists of applying resampling followed by feature selection.
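
The order of application favoured in this abstract (resampling first, feature selection second) can be sketched as a two-step pipeline. The abstract does not name the specific techniques, so random over-sampling and a simple filter score (absolute difference between the two class means) are assumptions chosen for brevity, as are the function names `random_oversample` and `top_k_features`.

```python
import random


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def top_k_features(X, y, k):
    """Return the indices of the k features with the largest gap between the two class means.

    Assumes a binary problem, as in gender classification.
    """
    a, b = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col_a = [row[j] for row, lab in zip(X, y) if lab == a]
        col_b = [row[j] for row, lab in zip(X, y) if lab == b]
        scores.append(abs(sum(col_a) / len(col_a) - sum(col_b) / len(col_b)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data: three samples of class 0, one of class 1, three features each.
X = [[0.0, 1.0, 5.0], [0.1, 1.1, 5.0], [0.2, 0.9, 5.0], [0.1, 1.0, 9.0]]
y = [0, 0, 0, 1]

# Step 1: balance the classes. Step 2: select features on the balanced data.
X_bal, y_bal = random_oversample(X, y)
selected = top_k_features(X_bal, y_bal, k=1)
X_final = [[row[j] for j in selected] for row in X_bal]
print(selected, y_bal)
```

Swapping the two calls gives the opposite order of application, which is the comparison the study describes.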